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Abstract 

We construct an algorithm that generates large, band-diagonal transition matrices for a 
totally asymmetric exclusion process (TASEP) with local hopping rate inhomogencitics. The 
matrices are diagonalized numerically to find steady-state currents of TASEPs with local vari- 
ations in hopping rate. The results are then used to investigate clustering of slow codons along 
mRNA. Ribosomc density profiles near neighboring clusters of slow codons interact, enhancing 
suppression of ribosome throughput when such bottlenecks are closely spaced. Increasing the 
slow codon cluster size, beyond ~ 3 — 4, does not significantly reduce ribosome current. Our 
results are verified by extensive Monte-Carlo simulations and provide a biologically-motivated 
explanation for the experimentally-observed clustering of low-usage codons. 
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During protein synthesis, ribosome molecules initiate at the 5' end of messenger RNA (mRNA), 
scan ("elongate") along the mRNA sequence, and terminate with the completed protein product at 
the 3' termination end (Fig. ^i). Each elongation step requires reading (translating) a nucleotide 
triplet (codon) and the binding of a freely diffusing transfer RNA (tRNA) molecule carrying the 
amino acid specific to each codon pp. Besides being a critical final stage of gene expression in vivo, 
control of protein synthesis is vital protein adaptation and evolution [21 |3J Q] , in control of viral 
parasitism (5], and in the biotechnology of high yield, cell-free, synthetic in vitro protein production 

mm 

The near-final stage of protein expression can be regulated by exploiting local variations in 
ribosome detachment and elongation rates (arising from different codon usages for any given amino 
acid). Relative concentrations of tRNA in solution can be a factor in determining local ribosome 
translation rates. This fact has been exploited in the modification of green fluorescent protein 
codons to match those preferred in mammalian systems, thereby optimizing expression levels [SJ. 
Binding of an incorrect tRNA can also temporarily prevent binding of the correct tRNA for a 
particular codon, slowing down elongation [T]. Regions of higher ribosome density can also protect 
the substrate mRNA from the action of enzymes which arrest translation. Finally, local pauses 
may provide time for regions of the peptide to sequentially fold before the integration of additional 
amino acids. 

The occurrence of "slow" codons (those with rare corresponding tRNA and/or amino acids), 
along mRNA is known to inhibit protein production [H1EI1> and is conjectured to serve a regulatory 
role in protein synthesis. Such "bottleneck" or "defect" codons (and the amino acids for which they 
code) arise infrequently, and typically include CTA (Leu), ATA (He), ACA (Thr), CCT and CCC 
(Pro), CGG, AGA, and AGG (Arg) ^2^]. Statistics indicate a higher occurrence of rare codons 
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Figure 1: (a) The mRNA translation/protein synthesis process. Multiple ribosomes (polysomes) 
move unidirectionally along mRNA as tRNA (not shown) deliver the appropriate amino acid to 
the growing chain. Codons with low concentrations of corresponding tRNA result in bottlenecks 
that locally suppress ribosome motion across it. (b) A simple totally asymmetric exclusion process 
on N lattice sites used to model mRNA translation in the presence of slow codons bottlenecks, or 
"defects." 

near the 5' initiation site of E. coli genes Even more striking is the bias for rare codons to 

cluster . Statistical analyses show small (2-5) clusters of rare codons occur frequently in E. coli, 
Drosophilia, yeast, and primates [TT1 IT1| . 

Although the strength, number, and positioning of bottlenecks along an mRNA chain undergo- 
ing translation clearly affect local ribosome densities and overall translation rates, there has been 
no quantitative model describing how various bottleneck motifs control ribosome throughput and 
protein synthesis. In this paper, we consider a simple physical model of how specific codon us- 
ages that give rise to local delays in elongation can be used to suppress protein synthesis. Our 
results provide biophysical hypotheses for slow codon clustering within a simple nonequilibrium 
steady-state stochastic framework. Ribosome density boundary layers surrounding two nearby 
bottlenecks, or defects are shown to act synergistically to regulate and amplify the suppression of 
protein production. 

We model mRNA translation by ribosome particles using a nonequilibrium totally asymmetric 
exclusion process (TASEP) |151 116j with a few carefully distributed slow sites (cf. Fig. In 
the TASEP, ribosome particles attach (initiate) at the first lattice site with rate a, only if the first 
site is empty. Interior ribosome particles can move forward with rate pi from site i to site i + 1 
only if site i + 1 is empty. For each step a ribosome moves forward, a codon is read, and an amino 
acid-carrying tRNA delivers its amino acid to the growing polypeptide chain. No motion is allowed 
if the site in front of a particular ribosome is occupied. Each ribosome that reaches the last site 
i = N (the 3' termination site) has polymerized a complete protein and detaches with rate (3. 

In the case of uniform pi = 1, the protein production rate {e.g. the steady-state ribosome 
current) and ribosome density along the mRNA are known exactly in terms of a and (3 |15l I16| . 
In the long chain (N — > oo) limit, the steady-state results reduce to simple forms illustrating the 
fundamental physical regimes. The current may be entry-rate limited (a < 1/2, a < (3), where 
the ribosome density is low and the steady-state current J(a) = a(l — a) depends only on a. If 
(3 is sufficiently small ((3 < 1/2,(3 < a), the density is high and the current J((3) = (3(1 — (3) is a 
function of the rate- limiting exit step only. When both a, (3 > 1/2, the rate-limiting processes are 
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the uniform internal hopping rates, and the steady-state current reaches its maximum attainable 
value J = 1/4. These three regimes define a phase diagram or the steady-state current of the 
TASEP. For a typical mRNA length of ~ 100 — 1000 codons, these simple analytic forms for J are 
extremely accurate. For the analyses in the absence of ribosome detachments, we shall restrict our 
attention to effectively the N ^ oo limit. 
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Figure 2: Placements of slow defects, or rare-usage codons (thick red segments), (a) Two defects 
near the initiation site straddled by an n = 5 lattice segment, (b) Two defects in the chain interior, 
away from the boundaries, n = 6. (c) A single slow defect near the termination end of the chain, 
n = 4. 

There is no general theory for computing steady-state particle (ribosome) currents when the 
internal hopping rates pi vary arbitrarily with lattice position i. However, specific motifs {pi} (such 
as isolated defects and periodic variations) can be treated with approximations and simulation. 
Since slow codons are also rare, with typical densities of 0.03, we will consider simple configurations 
of few, identical bottlenecks (with hopping rate q < 1) distributed within an otherwise uniform (with 
rate 1) lattice. Finite numbers of "fast" defects with q > 1 do not affect steady-state currents since 
the rate limiting hops pi = 1 dominate the lattice. Figures E(a-c) show hypothetical placements of 
defects near the 5' initiation end, the mRNA interior, and the 3' termination end. Within a finite 
segment (of length n = 5, 6, 4 sites in Figs. |2 respectively) straddling the bottlenecks, we explicitly 
enumerate all 2 n distinct states according to the algorithm indicated in Fig. |21 This generates a 
band- diagonal (of width n) 2 n x 2 n matrix coupling the probabilities Pj, (1 < j < 2 n ) that the 
segment is in state j. 

Now consider an interior segment (Fig. I2fb)). The mean density in the site immediately to the 
left(right) of the segment is denoted cr_(o"+). The transition matrix contains the parameters o~± 
since entry and exit into the enumerated segment is proportional to ct_ and 1 — <r+, respectively. 
The steady-state current is calculated from the appropriate elements of the singular eigenvector 
associated with the zero-eigenvalue of the transition matrix. For example, the particle flux out 
of the rightmost site of the segment is J(cj_, cj + , {pi}) = (1 — o- + )^2j =odd Pj(a~,a + ,{pi}), since 
the odd states correspond to those with a particle at the last site in the segment. The singular 
eigenvector of the 2™ x 2 n , sparse, band-diagonal transition matrix was computed (up to n ~ 18) 
using an implicit restarted Arnoldi iteration method JH] found in Matlab. The effects of the 
perturbed particle correlations surrounding a bottleneck are accounted for provided n is larger 
than the density boundary layer thickness. Since the densities are uniform far enough away from 
the defects, we assume that they are also o--(o~+) far to the left(right) of the segment. Thus, the 
mean-field currents well to the left and right of the segment are J_ = cr_(l — cr_) = J + = a + {\ — <r+). 
The only physical solution is <r_ = 1 — a + . Equating J(cr_ = 1 — cr + ,a + , {pi}) = <r+(l — <7+), and 
solving for cr+ numerically, we find the ribosome current J. For a < 1/2 and defects near the 
initiation end, we simply equate J(a,a + ,{pi}) = <r+(l — <r+) and solve for a + . For (3 < 1/2 and 
defects near the termination end, cr_ is determined from <r_(l — <r_) = J(cr_, /?, {pi})- We used this 
improved, systematic finite-segment mean-field theory (FSMFT) to compute currents of TASEPs 
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Figure 3: The algorithm that generates the transition matrix for a TASEP. For illustration, consider 
a three site model. Each possible occupancy of the lattice is associated with a bit pattern, and the 
state is enumerated with the corresponding decimal value. For example given that Oil is the binary 
representation of 3, we label the state with particles occupying the second and third lattice sites 
as state 3. Next we divide the states into two groups; states where the first lattice site is occupied 
(1-states), and states where the first lattice site is empty (0-states). Regardless of the number of 
lattice sites in the TASEP, the transitions between the two classes of states always occur between 
the first half of the 1-states and the second half of the 0-states (dashed red arrows). To determine 
the remaining transitions we call the algorithm recursively on both the 1-states and the 0-states, 
making sure to ignore the highest order bit (i.e. the leftmost lattice site). Finally we add in the 
trivial transitions between each 0-state and each 1- state that are the result of injection at the left 
edge of the lattice (blue arrows). With the connectivity of the states fully enumerated, assigning 
the appropriate rate to each of the transitions is straightforward. 



with representative placements of internal defects. In practice, segment lengths that include only 
2-3 sites on each side of a defect were sufficient for obtaining extremely accurate results. Efficient 
continuous-time Monte-Carlo simulations using the Bortz-Kalos-Lebowitz (BKL) algorithm |19l I2U| 
were performed on lattices of size N > 10000 to verify and extend all our results. 

MC simulations show that for a, (3 > 1/2, the currents J are insensitive to the position of 
defects. The behavior for a, (3 < 1/2 resembles an interior defect near the initiation or termination 
end. Slow initiation and/or termination rates can be effectively described by defects (q < 1) near 
the ends of the lattice. Therefore, we can restrict our analysis to large a, (3 ^> 1/2. Hopping across 
a single interior defect (with rate q < 1) is the overall rate-limiting step. For this single defect, 
the reduced steady-state current Ji(q), found from both FSMFT and MC simulations are shown in 
Fig. 0(a). The n = 4 FSMFT yields currents within 2% of those computed from MC simulations. 
The least-accurate n = FSMFT gives J = q/(q + l) 2 and is equivalent to previous treatments of 
a single defect |17j . Larger segments n yield increasingly unwieldy algebraic expression for J\{q). 
The FSMFT (which is exact if n = N) is asymptotically correct for q — » (0, 1) and is a systematic 
expansion in J = Yl^i a iQ l about q = 0. Coefficients up a n+ ± can be shown to be given correctly 
by a 2n-segment MFT, i.e., for n > 1, J ~ q - 3q 2 /2 + 0(q 3 ) rather than J ~ q - 2q 2 + 0(q 3 ) 
predicted by the 0-segment MFT [T7| . 

The current J can be further reduced upon adding successive, contiguous defects. The steady- 
state current across m equivalent, contiguous interior bottlenecks can be shown to have the analytic 
expansion 

Jrn(q^0)~(^±^q + O(q 2 ). (1) 
As m increases, the current is determined by the rate-limiting segment which resembles a uniform 
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Figure 4: (a) Comparison of steady-state currents J\{q) for a single defect derived from MC sim- 
ulations with those obtained from n-segment MFT. For n > 4, the FSMFT results are within 2% 
of those from MC simulations. The boundary injection/extraction rates were not rate limiting 
(a = (3 = 10). (b) Further reduction of steady-state current as successive, identical defects are 
added. The first few defects cause most of the reduction in current. 

chain of hopping rates q with relatively fast injection from, and extraction into, the remainder of the 
chain with hopping rates pi = 1. Thus, the current approaches that of a long, uniform chain with 
hopping rates q in the maximal current regime: J = q/4. Figure^Jb) plots J m (q)/q for various q as 
a function of m defects. A 14-segment MFT was used in to generate the results and was checked to 
provide high numerical accuracy up to m ~ 12. For all q < 1, J m (q)/q approaches 1/4 as m — > N, 
with the largest changes occurring for small m and q. The leading order approximation in (pQ) is 
very accurate (compared to FSMFT or MC simulations) for q as large as 0.1. 

For strong bottlenecks with q < 0.3, the largest reduction in J m (q) occurs as m = 1 — > 2. 
Therefore, one may consider the effects of placing only two bottlenecks in the mRNA interior. Fig. 
Efa) shows the expected currents J2(q] k) across a chain containing two defects spaced k sites apart. 
Results for k < 10 were computed using an n = 1 4 FSMFT, while those for k > 10 were obtained 
from MC simulations. The largest reduction in the current occurs when two defects are spaced 
as closely as possible. The current J2(q;k — > oo) — > J\{q) eventually approaches that for a single 
defect. A finite number of multiple defects, if spaced far apart, will not significantly decrease J 
relative to the case of a single defect. As k increases, the density downstream of the first defect 
recovers to the bulk value <7_ before encountering the second defect. This behavior is clearly shown 
(using both FSMFT and MC simulations) in Fig. E{b) for a pair of defects (q = 0.15). For two 
identical defects with small q we find 



Therefore, the current for two identical bottlenecks is at most a factor of 2 smaller than that for a 
single defect. This maximal contrast occurs when q — > and when the two bottlenecks are adjacent 
to each other. Fig. E[c) plots the variation in J2{q\ k)/J\(q) as a function of k for various q. 

Typically, a cell has a fixed number of rare codons with which to implement different translation 
regulation strategies. From the two-defect current J2, we estimate the expected current for m 
randomly distributed bottlenecks. The current will be reduced to a value less than or equal to 
J2(q;k) only when two or more defects are spaced within k sites. The total number of ways m 
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Figure 5: (a) Steady-state currents across a chain with two identical defects in the interior spaces 
by k lattice sites. The current is suppressed most when the defects are closely spaced, (b) The 
density profiles centered at the midpoint between two defects (q = 0.15) of various spacings k. The 
thick vertical bars denote the defect positions for k = 6. For larger k, density boundary layers heal, 
allowing particle build-up behind the second barrier, enhancing the current, (c) The dependence 
of the normalized steady-state current J2(q] k)/ J\(q) as a function of the defect separation. 

defects can be placed on N — 1 interior sites, such that the minimum pair spacing is k is equivalent 
to the "Tonk's gas" partition function for m particles of integer length k distributed on a lattice 
with N - 1 + k sites $ZT\ : 

-( w - 1 - (m m - 1)( *- 1) ). (3) 

The probability density that the minimum inter-defect spacing equals k is thus 

Qfc(m ' N) = Z^N) • (4) 

The probability density Qk(m, 300) is shown in Fig. HJa) for m = 4,8, 12, 16. For small number 
of defects m, the probability density is broad in k, indicating the large number of separations k 
uniformly accessible to any of the few pairs. As m increases, the likelihood of any pair being close 
increases, pushing Qk(m,N) to peak at small k. 

The maximum current for any configuration with a minimum defect spacing k is J2(<?; k). Thus, 
an upper bound for the randomness-averaged current is 

int{(7V-l)/(m-l)} 

(J)ran< £ Q k (m, N)J 2 (q; k) . (5) 
k=l 

This result will be very accurate if the defect density is low enough that one can neglect the 
probability that more than two defects each separated by k sites form a single cluster, particularly 
for small k. Although the most likely minimum defect spacing is k = 1, at low defect densities, the 
total probability of closely-spaced defects remains small and the weight of Qk{m,N) at larger k 
dominate the statistics of (J) ra n- The disorder-averaged current (J(m,N = 300)) mn (normalized 
by Ji(q)) is shown in Fig. E^b). For very small m, the current is approximately that of a single 
defect. The current is most sensitive to the number of random defects at m ~ 10, corresponding 
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Figure 6: (a) The probability Qk{m,N) that the minimum separation of any two defects (of a 
total of m on an N site lattice section) is k (Eqn. (b) The upper bound for the mean current 
of a lattice with m defects randomly distributed within the central N = 300 sites (Eqn. ©)• 
This J\ ((^-normalized, randomness-averaged upper bound includes suppression only from pairwise 
defect configurations. A lower bound as m — * N — 1 is ( J(m — > N — l)) ran ~~ * ?/4. (c). The effects 
of a small, uniform detachment rate ujd at each site. Detachments suppress ribosome throughput 
only in the entry-limited (a = 0.1, (3 = 10) and maximal current (a = (3 = 10) regimes. 

to m/N ~ 0.03, approximately the fraction of slow codons observed in vivo. Nonetheless, the 
observation of enhanced, nonrandom clustering suggests that other biological regulation pathways 
exist and would yield currents measurably below the upper bound in ©. As m — ► N — 1, large 
clusters of defects dominate that are not accounted for in Qk(m,N), and the current approaches 
q/4. 

Finally, we consider the effects of a small uniform ribosome detachment at each site with rate 
iOd- Monte-Carlo simulations show in Fig. Etc) that the steady-state current is diminished only 
when the TASEP is in what normally would be the entry-limited (small a, large (3) and maximal 
current (large a, (3) regimes. When translation is initiation-limited, detachments occur for particles 
that have already passed the rate-limiting initiation stage, so they no longer contribute to the final 
current. In the maximal current regime, interior particles are also detaching, thus reducing the 
number of particles that contribute to the current. In the exit-limited regime (large a, small (3), 
detachments also decrease the overall ribosome density along most of the mRNA. However, if u>d is 
small (as in Fig. Et c )), termination remains the rate-limiting step, ribosomes are still able to pile 
up against the termination site, and the final ribosome throughput J = [3(Tn remains relatively 
unchanged. In the termination rate-limited regime, although the final current is not significantly 
suppressed by small u^,, the high ribosomal density along the mRNA results in a large current off 
the mRNA and may be metabolically wasteful. 

We have found that not only can a single defect directly inhibit elongation across it, but a 
few bottlenecks, properly distributed, can further slow protein production by a factor of ~ 2 — 4. 1 
Depending on the targeted biological outcome, qualitatively distinct strategies can be employed. 

1 Ribosomes are structurally larger than nucleotide triplets, occluding w ~ 10 codons. Although our FSMFT 
algorithm cannot be easily adapted for large particles, the steady-state properties of a w > 1 TASEP 1201 IHil are 
qualitatively the same as the standard TASEP were each particle occupies exactly one lattice site, as verified by 
extensive MC simulations. However, the finite ribosome size prevents defects spaced k < w codons apart to influence 
each other. 
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Although maximal current reduction is achieved by clustering defects as tightly as possible, succes- 
sive addition beyond a handful of defects does little to reduce the current. Defects which are all are 
spaced more than a handful of sites apart will not reduce the throughput more than a single defect. 
These results are qualitatively consistent with a single, localized region providing the rate limiting 
step(s) for translation. Since initiation, which requires assembly of numerous ribosome parts, is 
typically rate limiting, the observation of slow codons near the start codon ^3] (forming the equiv- 
alent of two closely-spaced defects near the start) suggests an expression-inhibiting, regulatory role 
for these initiation-end bottlenecks. Conversely, a finite number of well-separated defects at appro- 
priate junctures provide pause points for local, successive protein folding with minimal reduction in 
current. Additional processes such as detachment, when combined with upstream or downstream 
bottlenecks can also regulate ribosome throughput. 

The authors thank L. Shaw and A. B. Kolomeisky for valuable discussions. TC acknowledges 
support from the NSF (DMS-0206733), and the NIH (R01 AI41935). 
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